-
Notifications
You must be signed in to change notification settings - Fork 54
Allow DAGs to silence only errors matching predicate #654
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question for airflow deployment stuff: when merging this PR change, what is the process for handling the change in the variable interface? These changes are not backwards compatible with the existing variable interface, so this dag would crash on a type error (I think). Is the process to first turn the DAG off in production, then in any order, merge this PR and update the variable to the new interface, then re-enable the DAG after the new DAG code has been pulled?
In general, yes. In this case this would mean turning off Because we don't have the variable set up yet, we should be fine to simply merge this as a fast-follow -- but I'll update the PR description to make sure we double-check that before merging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I was worried we'd have to match on exception text, which would be an issue for things like KeyError
s where the error text is only the missing key and nothing else 😶 Hard to know that ahead of time! But since we're doing a simple substring match and we include the exception type in the slack message, we can match that too 🥳 I used KeyError
as my errors
value and raised a KeyError
within the DAG, it worked exactly as expected 🚀
I have one suggestion regarding string comparisons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎊 🤖
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome!
Fixes
Fast follow from #643 to implement a suggestion by @sarayourfriend
check_silenced_dags
, any silenced DAGs) are running while we deploy these changes and update the variable interface. Discussion.Description
#643 introduced an Airflow variable (
silenced_slack_alerts
) that allows configuring a DAG to skip Slack reporting of errors. The implementation in that PR only allows for skipping Slack reporting for all error messages, but in reality we're unlikely to want this behavior. It's much safer to be able to silence specific known errors.This PR changes the shape of the
silenced_slack_alerts
:For each
dag_id
you must now provide both anissue
and a list oferrors
. A slack alert will not be sent if it matches any of the predicates in theerrors
list for the given DAG.Because our on_failure_callback includes information like the ExceptionType and task id in the message, we can use this to selectively silence messages matching those parameters as well.
Testing Instructions
silenced_slack_alerts
in the Admin > Variables UI. For my tests I used this configuration:Skipping Slack alert for <dag_id>
.errors
configuration to test that errors that do not match any of the predicates are still reported to Slack.Now test
check_silenced_dags
DAG:Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin